SparkTrails: A MapReduce Implementation of HypTrails for Comparing Hypotheses About Human Trails
Abstract
HypTrails is a Bayesian approach for comparing different hypotheses about human trails on the web. While a standard implementation exists, it exhibits performance issues when working with large-scale data. In this paper, we propose a distributed implementation of HypTrails based on Apache Spark that takes advantage of several structural properties inherent to HypTrails. Compared with the standard implementation, performance improves substantially. Our implementation is publicly available.
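To illustrate why HypTrails lends itself to a map/reduce formulation, the following is a minimal sketch in plain Python, not the authors' Spark code. It assumes the usual HypTrails setup: a first-order Markov chain with Dirichlet priors, whose log evidence is a sum of independent per-row terms. Whether this row-wise independence is among the structural properties the paper exploits is an assumption here. The names `count_transitions`, `row_log_evidence`, and `log_evidence`, as well as the toy trails and pseudo-counts, are hypothetical.

```python
from collections import Counter, defaultdict
from functools import reduce
from math import lgamma


def count_transitions(trails):
    """Count first-order transitions n[src][dst] over all trails."""
    counts = defaultdict(Counter)
    for trail in trails:
        for src, dst in zip(trail, trail[1:]):
            counts[src][dst] += 1
    return counts


def row_log_evidence(row_counts, row_alpha):
    """Log Dirichlet-multinomial evidence for a single source state.

    row_alpha maps every destination state to a pseudo-count > 0
    (assumption: it covers all destinations seen in row_counts).
    """
    total_alpha = sum(row_alpha.values())
    total_n = sum(row_counts.values())
    result = lgamma(total_alpha) - lgamma(total_alpha + total_n)
    for dst, alpha in row_alpha.items():
        result += lgamma(alpha + row_counts.get(dst, 0)) - lgamma(alpha)
    return result


def log_evidence(trails, alpha):
    """Log evidence of the trails under the Dirichlet prior `alpha`."""
    counts = count_transitions(trails)
    # "Map": each row is scored independently of all other rows.
    per_row = map(
        lambda src: row_log_evidence(counts.get(src, Counter()), alpha[src]),
        alpha,
    )
    # "Reduce": the total log evidence is the sum of the row terms.
    return reduce(lambda a, b: a + b, per_row, 0.0)


# Toy data (hypothetical): two states, trails that mostly alternate.
trails = [["A", "B", "A", "B"], ["B", "A"]]
uniform = {"A": {"A": 1.0, "B": 1.0}, "B": {"A": 1.0, "B": 1.0}}
alternating = {"A": {"A": 1.0, "B": 5.0}, "B": {"A": 5.0, "B": 1.0}}

print("uniform:    ", log_evidence(trails, uniform))
print("alternating:", log_evidence(trails, alternating))
```

Because each row term depends only on that row's counts and pseudo-counts, the rows could in principle be distributed across workers and summed at the end; the hypothesis with the larger log evidence is the more plausible one for the observed trails.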
Related resources
Photowalking the City: Comparing Hypotheses About Urban Photo Trails on Flickr
Understanding human movement trajectories represents an important problem that has implications for a range of societal challenges such as city planning and evolution, public transport or crime. In this paper, we focus on geotemporal photo trails from four different cities (Berlin, London, Los Angeles, New York) derived from Flickr that are produced by humans when taking sequences of photos in ...
Discovering and Characterizing Mobility Patterns in Urban Spaces: A Study of Manhattan Taxi Data
Nowadays, human movement in urban spaces can be traced digitally in many cases. It can be observed that movement patterns are not constant, but vary across time and space. In this work, we characterize such spatio-temporal patterns with an innovative combination of two separate approaches that have been utilized for studying human mobility in the past. First, by using non-negative tensor factor...
Comparing Hypotheses About Sequential Data: A Bayesian Approach and Its Applications
Sequential data can be found in many settings, e.g., as sequences of visited websites or as location sequences of travellers. To improve the understanding of the underlying mechanisms that generate such sequences, the HypTrails approach provides for a novel data analysis method. Based on first-order Markov chain models and Bayesian hypothesis testing, it allows for comparing a set of hypotheses...
Understanding How Users Edit Ontologies: Comparing Hypotheses About Four Real-World Projects
Ontologies are complex intellectual artifacts and creating them requires significant expertise and effort. While existing ontology-editing tools and methodologies propose ways of building ontologies in a normative way, empirical investigations of how experts actually construct ontologies “in the wild” are rare. Yet, understanding actual user behavior can play an important role in the design of ...
Comparing Distributed Indexing: To MapReduce or Not?
Information Retrieval (IR) systems require input corpora to be indexed. The advent of terabyte-scale Web corpora has reinvigorated the need for efficient indexing. In this work, we investigate distributed indexing paradigms, in particular within the auspices of the MapReduce programming framework. In particular, we describe two indexing approaches based on the original MapReduce paper, and comp...
Publication date: 2016